LOB phantom workaround: skip-lob-tables + clean test environment#18
Replace seq-by-seq comparison with replay_final_state() for LOB tables. Replays INSERT/UPDATE/DELETE ops into a final row state per event_id, then compares end results. This handles:
- LogMiner merging INSERT + LOB_WRITE into a single record (L2)
- OLR emitting extra/fewer intermediate events (phantom undo #15)
- Seq number shifting from misaligned event counts

Non-LOB tables remain seq-by-seq (100% accurate, no merge issues). LOB known issues: 60 → 18 (irreducible phantom commits).
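A minimal sketch of the replay idea, assuming a simplified `(event_id, op, values)` record shape (the real validator rows carry more fields):

```python
def replay_final_state(events):
    """Collapse an ordered stream of (event_id, op, values) records into
    the final per-event_id row state; op is INSERT/UPDATE/DELETE."""
    state = {}
    for event_id, op, values in events:  # events must be in commit order
        if op in ('INSERT', 'UPDATE'):
            # Merge, so columns untouched by an UPDATE keep earlier values
            state.setdefault(event_id, {}).update(values)
        elif op == 'DELETE':
            state.pop(event_id, None)  # row no longer exists

    # LogMiner's merged INSERT+LOB_WRITE and OLR's INSERT-then-UPDATE both
    # converge to the same final state, so intermediate-event differences
    # (phantom undo, shifted sequence numbers) no longer cause false diffs.
    return state
```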
LogMiner merges INSERT + LOB_WRITE into a single INSERT (L2), while OLR emits them separately (INSERT then UPDATE). Comparing final op type (INSERT vs UPDATE) produces false diffs. Compare row_exists (bool) instead — both mean "row exists with these values". None values are NOT skipped — per Debezium docs, None means genuinely NULL while __debezium_unavailable_value means unchanged. These are semantically different.
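A sketch of the resulting comparison rules; the helper name and state shape are hypothetical, and only the `__debezium_unavailable_value` sentinel is from the source:

```python
DEBEZIUM_UNAVAILABLE = '__debezium_unavailable_value'

def compare_final(lm_state, olr_state):
    """Compare final row existence and values, never the final op type."""
    lm_exists, olr_exists = lm_state is not None, olr_state is not None
    if lm_exists != olr_exists:
        return ['row existence mismatch']
    if not lm_exists:
        return []  # both ended as DELETE: agreement
    diffs = []
    for col in sorted(set(lm_state) | set(olr_state)):
        lm_val, olr_val = lm_state.get(col), olr_state.get(col)
        # The unavailable-value sentinel means "column unchanged": skip it.
        # None is a genuine NULL and still participates in the comparison.
        if DEBEZIUM_UNAVAILABLE in (lm_val, olr_val):
            continue
        if lm_val != olr_val:
            diffs.append(col)
    return diffs
```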
Comprehensive analysis of phantom LOB events on Oracle RAC. OLR emits ~0.3% extra rows on LOB tables that don't exist in the database — from Oracle's internal LOB segment management. Investigated dataObj, bdba+slot, suppLogBdba, and opcode-based discrimination. None reliably distinguish internal LOB ops from user DML in raw redo. Only COMMITTED_DATA_ONLY (LogMiner) can filter them using Oracle-internal state. Non-LOB tables: 100% accurate across all test runs.
Allows users to skip emitting DML events for tables with LOB columns (CLOB/BLOB/NCLOB). Oracle's internal LOB segment management generates redo records indistinguishable from user DML, causing phantom events on RAC systems. This option lets users replicate LOB tables through a separate mechanism while using OLR for non-LOB tables.
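Enabling the option would look roughly like this in the OLR configuration; placement under the `format` section is an assumption here (see `4.format.adoc` for the authoritative syntax):

```json
{
  "format": {
    "type": "json",
    "skip-lob-tables": 1
  }
}
```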
The initial implementation only covered the multi-row (Multiple) code paths. Single-row INSERT and DELETE in Builder.cpp also need the check. Verified: OLR emits 0 FUZZ_LOB events with skip-lob-tables=1.
Switch both Debezium connectors from FileOffsetBackingStore (binary) to KafkaOffsetBackingStore. The `up` action now gets the current SCN from Oracle after table deployment and pre-seeds Kafka offset topics, ensuring both connectors start from exactly that SCN with no stale events from previous runs.
- Debezium configs: KafkaOffsetBackingStore + snapshot.mode=recovery
- fuzz-test.sh: seed offsets after Kafka is ready, before Debezium starts
- docker-compose: remove hostname for Kafka (incompatible with host network)
- FUZZ-TEST.md: document clean run procedure (down/up/run/validate/db-check)
- AGENTS.md: reference fuzz test documentation
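For illustration, a sketch of building one seed record for an offset topic. The key/value layout (engine name plus a `server` partition key, `scn` in the value) reflects Debezium's embedded-engine offset format as commonly documented and is an assumption, not taken from this PR:

```python
import json

def build_offset_seed(engine_name: str, topic_prefix: str, scn: int) -> str:
    """Build the key|value line piped to kafka-console-producer.sh
    (which is invoked with parse.key=true and key.separator='|')."""
    # Assumed layout: offsets are keyed by ["<engine name>", {"server": ...}]
    # and the Oracle connector's offset value carries the SCN to resume from.
    key = json.dumps([engine_name, {"server": topic_prefix}])
    val = json.dumps({"scn": str(scn)})
    return f"{key}|{val}"

print(build_offset_seed("lm-engine", "lm", 123456789))
```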
📝 Walkthrough
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User as "Operator"
participant Compose as "docker compose"
participant Kafka as "Kafka (host)"
participant Debezium as "Debezium connectors (LM/OLR)"
participant Oracle as "Oracle RAC"
participant Consumer as "fuzz-consumer (SQLite)"
participant DBCheck as "db-check.py"
rect rgba(0,128,255,0.5)
User->>Compose: fuzz-test.sh up
Compose->>Kafka: start Kafka
end
rect rgba(0,200,100,0.5)
Kafka->>User: ready
User->>Compose: _seed_debezium_offsets(current SCN)
Compose->>Debezium: start connectors (seeded offsets)
Debezium->>Oracle: read redo (from seeded SCN)
Debezium->>Kafka: publish CDC events
Kafka->>Consumer: deliver events -> write SQLite
end
rect rgba(255,128,0,0.5)
User->>Compose: fuzz-test.sh run <N>
User->>Oracle: run workload (includes sentinel row)
Oracle->>Debezium: redo stream contains sentinel
Debezium->>Kafka: stream sentinel event
Kafka->>Consumer: consumer persists sentinel to SQLite
end
rect rgba(200,0,150,0.5)
User->>DBCheck: fuzz-test.sh db-check
DBCheck->>Consumer: copy SQLite, wait for sentinel
DBCheck->>Oracle: query ground-truth final rows
DBCheck->>User: report matched / missing / extra / diffs
end
```
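The db-check leg of the diagram hinges on polling the copied SQLite snapshot until both CDC streams have persisted the sentinel row. A minimal sketch of that loop, assuming `refresh_db` and `has_sentinel` helpers like those named later in the review thread (their exact signatures are assumptions):

```python
import sqlite3
import time

def wait_for_sentinel(sqlite_path, refresh_db, has_sentinel,
                      timeout_s=300, poll_s=5):
    """Poll the copied SQLite snapshot until both event tables contain
    the sentinel event, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        refresh_db()  # re-copy fuzz.db (+WAL/SHM) out of the container
        try:
            with sqlite3.connect(sqlite_path) as conn:
                conn.row_factory = sqlite3.Row
                if has_sentinel(conn, 'lm_events') and has_sentinel(conn, 'olr_events'):
                    return True
        except sqlite3.Error:
            pass  # snapshot may be mid-copy; retry on the next poll
        time.sleep(poll_s)
    return False
```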
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/OpenLogReplicator.cpp (1)
779-785: Prefer constructor-initializing `Format` with `skipLobTables`.
`Format` is now built in a partially initialized state and then mutated immediately after. Threading this flag through the constructor keeps the config object fully initialized at creation sites and makes it harder for future call sites to miss the flag.

♻️ Proposed cleanup
```diff
-    Format format(dbFormat, attributesFormat, intervalDtsFormat, intervalYtmFormat, messageFormat, ridFormat, redoThreadFormat, xidFormat,
-                  timestampFormat, timestampMetadataFormat, timestampTzFormat, timestampType, charFormat, scnFormat, scnType, unknownFormat,
-                  schemaFormat, columnFormat, unknownType, userType, jsonNumberType, charsetOverrideId);
-    format.skipLobTables = skipLobTables;
+    Format format(dbFormat, attributesFormat, intervalDtsFormat, intervalYtmFormat, messageFormat, ridFormat, redoThreadFormat, xidFormat,
+                  timestampFormat, timestampMetadataFormat, timestampTzFormat, timestampType, charFormat, scnFormat, scnType, unknownFormat,
+                  schemaFormat, columnFormat, unknownType, userType, jsonNumberType, skipLobTables, charsetOverrideId);
```

```diff
-    USER_TYPE newUserType, JSON_NUMBER_TYPE newJsonNumberType = JSON_NUMBER_TYPE::AS_NULL, uint64_t newCharsetOverrideId = 0):
+    USER_TYPE newUserType, JSON_NUMBER_TYPE newJsonNumberType = JSON_NUMBER_TYPE::AS_NULL,
+    bool newSkipLobTables = false, uint64_t newCharsetOverrideId = 0):
     ...
     userType(newUserType), jsonNumberType(newJsonNumberType),
+    skipLobTables(newSkipLobTables),
     charsetOverrideId(newCharsetOverrideId) {}
```

Also applies to: 901-905
81-108: Robust offset seeding function, but topic creation error handling is opaque.

The function correctly seeds Debezium offsets with the current SCN. However, suppressing all stderr with `2>/dev/null` on both topic creation (line 97) and the producer (line 104) makes debugging difficult if something goes wrong.

Consider at least logging a debug message or checking the exit code of the producer command, since a failed seed would cause connectors to start from the wrong position.
Optional: Add minimal error checking for the producer
```diff
     # Produce seed offset
-    echo "${offset_key}|${offset_val}" | docker exec -i fuzz-kafka /opt/kafka/bin/kafka-console-producer.sh \
+    if ! echo "${offset_key}|${offset_val}" | docker exec -i fuzz-kafka /opt/kafka/bin/kafka-console-producer.sh \
         --bootstrap-server localhost:9092 \
         --topic "$offset_topic" \
         --property parse.key=true \
-        --property key.separator='|' 2>/dev/null
+        --property key.separator='|' 2>/dev/null; then
+        echo "WARNING: Failed to seed $offset_topic" >&2
+    fi
```

tests/dbz-twin/rac/validator.py (1)
284-303: LOB comparison silently counts differences without diagnostic output.

For LOB tables, when `lm_exists != olr_exists` (lines 292-294) or when value diffs exist (lines 298-299), the code increments `total_lob_known` but doesn't log the specific event_id or the nature of the difference. This makes it harder to investigate which specific LOB events are phantom.

Consider adding optional verbose logging for LOB differences to aid debugging.
Optional: Add diagnostic logging for LOB differences
```diff
         if lm_exists != olr_exists:
+            print(f"[LOB_KNOWN] {eid} existence mismatch: "
+                  f"LM={lm_exists} OLR={olr_exists}", flush=True)
             total_lob_known += 1
             total_validated += 1
         else:
             diffs = compare_values(lm_state, olr_state, event_table, 'after')
             if diffs:
+                print(f"[LOB_KNOWN] {eid} value diffs", flush=True)
                 total_lob_known += 1
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/builder/Builder.cpp`:
- Around line 806-809: The LOB-skip check (format.skipLobTables &&
!table->lobs.empty()) must be hoisted out of the left-hand branch so it always
prevents emitting DML for known LOB tables regardless of the fallback OR arms;
update the conditional around the insert/row-emission guard (the compound if
using schema, table, DbTable::isSystemTable, DbTable::isDebugTable,
table->matchesCondition and
ctx->isFlagSet(Ctx::REDO_FLAGS::SHOW_SYSTEM_TRANSACTIONS/SCHEMALESS)) to first
bail out when format.skipLobTables && !table->lobs.empty(), then evaluate the
rest of the predicate, and apply the same restructuring to the analogous
delete/update and multi-row guards referenced in the comment (the other
occurrences at the noted locations) so SHOW_SYSTEM_TRANSACTIONS or SCHEMALESS no
longer override the LOB skip and counted rows are correctly skipped.
In `@tests/dbz-twin/rac/fuzz-test.sh`:
- Around line 461-475: The action_db_check() function invokes a missing script
tests/dbz-twin/rac/db-check.py via python3 "$SCRIPT_DIR/db-check.py", which will
cause runtime failure; either add and commit the db-check.py script implementing
the expected polling-and-sentinel-check behavior (reading SQLITE_DB path passed
via environment and re-copying the container DB), or change action_db_check() to
call an existing, committed validation utility or inline the check logic; update
references to SQLITE_DB, ORACLE_HOST and ensure SCRIPT_DIR points to the script
location so python3 "$SCRIPT_DIR/db-check.py" can execute successfully.
📒 Files selected for processing (12)
- AGENTS.md
- documentation/LOB-PHANTOM-INVESTIGATION-REPORT.md
- documentation/json/4.format.adoc
- src/OpenLogReplicator.cpp
- src/builder/Builder.cpp
- src/common/Format.h
- tests/dbz-twin/rac/FUZZ-TEST.md
- tests/dbz-twin/rac/config/application-logminer-kafka.properties
- tests/dbz-twin/rac/config/application-olr-kafka.properties
- tests/dbz-twin/rac/docker-compose-fuzz.yaml
- tests/dbz-twin/rac/fuzz-test.sh
- tests/dbz-twin/rac/validator.py
- Hoist skip-lob-tables check outside the || fallback path at all 5 emit/skip decision points in Builder.cpp. Previously SHOW_SYSTEM_TRANSACTIONS or SCHEMALESS flags would bypass the skip and still emit LOB table events.
- Add missing db-check.py to the repository (was untracked).
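Illustrative pseudocode (Python, not the actual C++) of the hoist: the skip must be decided before the OR fallback arms of the predicate, otherwise either flag re-enables emission:

```python
from types import SimpleNamespace as NS

def should_emit(table, fmt, flags):
    """Illustrative only: mirrors the restructured guard, not Builder.cpp itself."""
    # Hoisted skip: evaluated before any fallback arm, so it always wins.
    if fmt.skip_lob_tables and table is not None and table.lobs:
        return False
    # Previously the skip sat inside the first arm of this OR, so
    # show_system_transactions / schemaless could still emit LOB rows.
    return ((table is not None and table.matches_condition)
            or flags.show_system_transactions or flags.schemaless)

fmt = NS(skip_lob_tables=True)
flags = NS(show_system_transactions=True, schemaless=False)
lob_table = NS(lobs=['C1'], matches_condition=True)
assert not should_emit(lob_table, fmt, flags)  # skip wins despite the flag
```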
Actionable comments posted: 4
🧹 Nitpick comments (1)
tests/dbz-twin/rac/db-check.py (1)
314-315: Preserve falsy values in diff previews.

`0` and empty strings are currently printed as `'None'`, which makes diff output misleading.

💡 Suggested fix
```diff
-            db_s = str(db_val)[:40] if db_val else 'None'
-            cdc_s = str(cdc_val)[:40] if cdc_val else 'None'
+            db_s = str(db_val)[:40] if db_val is not None else 'None'
+            cdc_s = str(cdc_val)[:40] if cdc_val is not None else 'None'
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/dbz-twin/rac/db-check.py`:
- Around line 8-17: The docstring promises a --watermark-scn CLI flag and
WATERMARK_SCN env var but main() never reads them; update main() to parse a
--watermark-scn argument (use argparse) and to read
os.environ.get("WATERMARK_SCN") as a fallback, prefer the CLI value over the env
var, convert the value to an integer (SCN) and use that SCN instead of always
querying Oracle for the current SCN; ensure the code path that currently queries
Oracle for the SCN only runs when neither the CLI flag nor WATERMARK_SCN is
provided so behavior matches the documentation. (A minimal sketch of this precedence follows the inline-comments list below.)
- Around line 217-224: The current per-table except block around cursor.execute
for table_name only logs a warning and continues, which can hide failures;
update the exception handling in that try/except to either raise the exception
or set the overall failure flag (any_issues=True) before continuing so a failed
SELECT does not produce a false “ALL MATCH”; specifically modify the except
Exception as e block that prints the warning for cursor.execute to also mark
any_issues=True (or re-raise) so the calling check reflects the error for
table_name instead of silently skipping it.
- Around line 90-101: refresh_db currently runs docker cp commands via
subprocess.run and ignores failures, allowing stale SQLite snapshots to persist;
update refresh_db to check each subprocess.run result (calls that copy fuzz.db,
fuzz.db-wal, fuzz.db-shm from container 'fuzz-consumer' to sqlite_path) and fail
fast on non-zero returncode by raising an exception or aborting (include the
stderr output in the error). Ensure you reference the subprocess.run invocations
in refresh_db and validate returncode for each call, handling errors immediately
rather than silently continuing.
- Around line 115-127: Replace the try/except block that calls sqlite3.connect
with a with-statement using sqlite3.connect(sqlite_path) as conn and keep
conn.row_factory set inside that context, then call has_sentinel, get_max_scn
and get_event_count (for 'lm_events' and 'olr_events') inside the with so the
connection is always closed; also replace the broad except Exception with a more
specific except sqlite3.Error (or multiple specific sqlite3 exceptions if
desired) to handle DB errors and still sleep/continue on failure.
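For the first comment above, the suggested precedence (CLI flag over the `WATERMARK_SCN` env var, with the Oracle query as the last resort) could look like this; the helper name is hypothetical, and the follow-up commit ultimately removed the stale docstring instead of implementing the flag:

```python
import argparse
import os

def resolve_watermark_scn():
    """Return the SCN from --watermark-scn, else WATERMARK_SCN, else None
    (in which case the caller falls back to querying Oracle)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--watermark-scn', type=int, default=None)
    args = parser.parse_args()
    if args.watermark_scn is not None:
        return args.watermark_scn  # CLI value wins over the env var
    env_val = os.environ.get('WATERMARK_SCN')
    return int(env_val) if env_val else None
```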
📒 Files selected for processing (2)
- src/builder/Builder.cpp
- tests/dbz-twin/rac/db-check.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/builder/Builder.cpp
- Remove stale docstring advertising unimplemented --watermark-scn flag
- Fail fast on docker cp failures in refresh_db() (main DB required, WAL/SHM optional)
- Use sqlite3 context manager and catch specific sqlite3.Error in polling loop
- Raise on Oracle query failures instead of silently skipping tables
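A sketch of the fail-fast `refresh_db()` behavior described in the second bullet; the container-side path is an assumption, while the file names and container name come from the review thread:

```python
import os
import subprocess

def refresh_db(sqlite_path: str):
    """Re-copy the consumer's SQLite snapshot out of the container.
    The main DB is required; WAL/SHM sidecars are best-effort."""
    dest_dir = os.path.dirname(sqlite_path) or '.'
    for name, required in [('fuzz.db', True),
                           ('fuzz.db-wal', False),
                           ('fuzz.db-shm', False)]:
        res = subprocess.run(
            ['docker', 'cp', f'fuzz-consumer:/data/{name}', dest_dir],
            capture_output=True, text=True)
        if res.returncode != 0 and required:
            # Fail fast: validating a stale main DB would report bogus matches.
            raise RuntimeError(f'docker cp {name} failed: {res.stderr.strip()}')
```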
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/dbz-twin/rac/db-check.py (1)
333-343: Use context manager for sqlite3 connection for consistency.

Unlike `wait_for_sentinel`, which properly uses `with sqlite3.connect(...) as conn:`, this connection is manually opened and closed. If an exception occurs during `replay_events`, the connection won't be properly closed.

♻️ Suggested fix
```diff
     # Replay
-    conn = sqlite3.connect(sqlite_path)
-    conn.row_factory = sqlite3.Row
-
-    print("\n--- Replaying LM events ---")
-    lm_state = replay_events(conn, 'lm_events')
-    print(f"  LM replay rows: {len(lm_state)}")
-
-    print("\n--- Replaying OLR events ---")
-    olr_state = replay_events(conn, 'olr_events')
-    print(f"  OLR replay rows: {len(olr_state)}")
-    conn.close()
+    with sqlite3.connect(sqlite_path) as conn:
+        conn.row_factory = sqlite3.Row
+
+        print("\n--- Replaying LM events ---")
+        lm_state = replay_events(conn, 'lm_events')
+        print(f"  LM replay rows: {len(lm_state)}")
+
+        print("\n--- Replaying OLR events ---")
+        olr_state = replay_events(conn, 'olr_events')
+        print(f"  OLR replay rows: {len(olr_state)}")
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/dbz-twin/rac/db-check.py`:
- Around line 140-189: In replay_events, the CDC table name is used without
normalization which can mismatch the uppercase TABLES and compare_states
lookups; update the assignment of table (row['table_name']) inside the
replay_events function to normalize to uppercase (e.g., set table =
row['table_name'].upper()) so keys stored in state use the same uppercase table
name as column normalization and the TABLES constant.
📒 Files selected for processing (1)
tests/dbz-twin/rac/db-check.py
Aligns with the TABLES constant and Oracle query results, which are uppercase. Prevents a silent mismatch if CDC events use mixed-case table names.
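A toy illustration of why the normalization matters for dict lookups; the `TABLES` entries and helper below are made up:

```python
TABLES = [('FUZZ_BASIC', 'ID', ['VAL']),
          ('FUZZ_LOB', 'ID', ['PAYLOAD'])]  # illustrative, not the real constant

def state_key(table_name: str, pk_val) -> tuple:
    # Normalize once at the boundary: CDC events may carry mixed-case table
    # names, while TABLES and Oracle's dictionary views are uppercase.
    return (table_name.upper(), pk_val)

assert state_key('fuzz_lob', 42) == state_key('FUZZ_LOB', 42)
```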
🧹 Nitpick comments (2)
tests/dbz-twin/rac/db-check.py (2)
333-343: Use context manager for SQLite connection in main().

Similar to the pattern used in `wait_for_sentinel`, wrap the connection in a context manager to ensure cleanup if an exception occurs during replay.

♻️ Proposed fix
```diff
     # Replay
-    conn = sqlite3.connect(sqlite_path)
-    conn.row_factory = sqlite3.Row
-
-    print("\n--- Replaying LM events ---")
-    lm_state = replay_events(conn, 'lm_events')
-    print(f"  LM replay rows: {len(lm_state)}")
-
-    print("\n--- Replaying OLR events ---")
-    olr_state = replay_events(conn, 'olr_events')
-    print(f"  OLR replay rows: {len(olr_state)}")
-    conn.close()
+    with sqlite3.connect(sqlite_path) as conn:
+        conn.row_factory = sqlite3.Row
+
+        print("\n--- Replaying LM events ---")
+        lm_state = replay_events(conn, 'lm_events')
+        print(f"  LM replay rows: {len(lm_state)}")
+
+        print("\n--- Replaying OLR events ---")
+        olr_state = replay_events(conn, 'olr_events')
+        print(f"  OLR replay rows: {len(olr_state)}")
```
201-236: Use context manager for Oracle connection to ensure cleanup on exceptions.

If an exception occurs after `oracledb.connect()` but before `conn.close()` (e.g., during row processing at lines 219-233), the connection leaks.

♻️ Proposed fix using context manager
```diff
-    conn = oracledb.connect(dsn)
-    cursor = conn.cursor()
     state = {}
     failures = []
-    for table_name, pk_col, compare_cols in TABLES:
-        all_cols = list(dict.fromkeys([pk_col, 'EVENT_ID'] + compare_cols))
+    with oracledb.connect(dsn) as conn:
+        cursor = conn.cursor()
+
+        for table_name, pk_col, compare_cols in TABLES:
+            all_cols = list(dict.fromkeys([pk_col, 'EVENT_ID', *compare_cols]))
     ...
-    cursor.close()
-    conn.close()
+        cursor.close()
+
     if failures:
         raise RuntimeError(f"Oracle query failures: {failures}")
     return state
```
📒 Files selected for processing (1)
tests/dbz-twin/rac/db-check.py
Summary
- `skip-lob-tables` format option to skip DML events for tables with LOB columns

Changes
OLR
- `skip-lob-tables` format option (`Format.h`, `Builder.cpp`, `OpenLogReplicator.cpp`): when set to `1`, silently skips INSERT/UPDATE/DELETE for tables with CLOB/BLOB/NCLOB columns. Covers all 5 emit/skip decision points in Builder.cpp.
- `4.format.adoc` updated with the new option; `LOB-PHANTOM-INVESTIGATION-REPORT.md` documents the root cause and signals investigated.

Fuzz test framework
- `action_up` now gets the current SCN after table deployment and pre-seeds offset topics, ensuring no stale events from previous runs.
- `snapshot.mode=recovery`: rebuilds schema history from the pre-seeded offset without re-snapshotting data.
- `FUZZ-TEST.md` and `AGENTS.md` updated with the clean run procedure.

Test results
Non-LOB accuracy (SKIP_LOB=1, 10-minute RAC fuzz):
Redo log regression: 161 passed.
Related
Test plan
- `make test-redo` — 161 passed
- `SKIP_LOB=1 ./fuzz-test.sh run 10` + validate + db-check — 100% match
- `skip-lob-tables=1` verified: 0 new LOB events from OLR

Summary by CodeRabbit
New Features
Documentation
Tests